Decision Tree Induction: How Effective is the Greedy Heuristic?

Authors

  • Sreerama K. Murthy
  • Steven Salzberg
Abstract

Most existing decision tree systems use a greedy approach to induce trees: locally optimal splits are induced at every node of the tree. Although the greedy approach is suboptimal, it is believed to produce reasonably good trees. In the current work, we attempt to verify this belief. We quantify the goodness of greedy tree induction empirically, using the popular decision tree algorithms C4.5 and CART. We induce decision trees on thousands of synthetic data sets and compare them to the corresponding optimal trees, which in turn are found using a novel map coloring idea. We measure the effect on greedy induction of variables such as the underlying concept complexity, training set size, noise and dimensionality. Our experiments show, among other things, that the expected classification cost of a greedily induced tree is consistently very close to that of the optimal tree.
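
To make "locally optimal splits at every node" concrete, here is a minimal Python sketch of greedy top-down induction. It is an illustration under stated assumptions, not the procedure used in the paper: it scores axis-parallel threshold splits by plain information gain, whereas C4.5 uses gain ratio and CART typically uses the Gini index, and it omits pruning. The function names (entropy, best_greedy_split, build_tree) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_greedy_split(rows, labels):
    """Return (gain, feature_index, threshold) for the single best split.

    Only the immediate information gain is considered; no look-ahead.
    Returns (0.0, None, None) when no split improves on the current node.
    """
    n = len(rows)
    base = entropy(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = (base
                    - (len(left) / n) * entropy(left)
                    - (len(right) / n) * entropy(right))
            if gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels):
    """Recursively apply the locally best split; leaves hold the majority class."""
    gain, f, t = best_greedy_split(rows, labels)
    if f is None:  # no split with positive gain: make a leaf
        return Counter(labels).most_common(1)[0][0]
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))


if __name__ == "__main__":
    rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
    labels = ["a", "a", "b", "b"]
    print(build_tree(rows, labels))
```

Each internal node is a tuple (feature index, threshold, left subtree, right subtree). Because every choice is made from the immediate gain alone, the resulting tree is not guaranteed to be the smallest or most accurate one, which is the gap the abstract sets out to measure.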

Similar resources

Decision Tree Induction Systems: A Bayesian Analysis

Decision tree induction systems are being used for knowledge acquisition. Yet they have been developed without proper regard for the subjective Bayesian theory of inductive inference. This paper examines the problem tackled by these systems from the Bayesian view in order to interpret the systems and the heuristic methods they use. It is shown that decision tree systems depart from the usual Ba...

A New Decision Tree Induction Using Composite Splitting Criterion

The C4.5 algorithm is the most widely used decision tree algorithm so far, and its most popular heuristic function is gain ratio. This heuristic function has a serious disadvantage in dealing with irrelevant featured data sources. Hill climbing is a machine learning technique used in searching. It has a good searching mechanism. Considering the relationship between hill cl...

Finding Optimal Multi-Splits for Numerical Attributes in Decision Tree Learning

Handling continuous attribute ranges remains a deficiency of top-down induction of decision trees. They require special treatment and do not fit the learning scheme as well as one could hope for. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. This topic has attracted abundant attention in recent years. In particular, Fayyad and Irani showed how opt...

Avoiding the Look-Ahead Pathology of Decision Tree Learning

Most decision-tree induction algorithms use a local greedy strategy, where a leaf is always split on the best attribute according to a given attribute selection criterion. A more accurate model could possibly be found by looking ahead for alternative subtrees. However, some researchers argue that look-ahead should not be used due to a negative effect (called "decision tree pathology")...

Publication date: 1995